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Cosmological parameter uncertainties are often stated assuming a particular model, neglecting the 
model uncertainty, even when Bayesian model selection is unable to identify a conclusive best model. 
Bayesian model averaging is a method for assessing parameter uncertainties in situations where there 
is also uncertainty in the underlying model. We apply model averaging to the estimation of the 
parameters associated with the primordial power spectra of curvature and tensor perturbations. 
We use CosmoNest and MultiNest to compute the model evidences and posteriors, using cosmic 
microwave data from WMAP, ACBAR, BOOMERanG and CBI, plus large-scale structure data 
from the SDSS DR7. We find that the model-averaged 95% credible interval for the spectral index 
using all of the data is 0.940 < n s < 1.000, where n B is specified at a pivot scale 0.015 Mpc -1 . For 
the tensors model averaging can tighten the credible upper limit, depending on prior assumptions. 

PACS numbers: 98.80.-k 



I. INTRODUCTION 

Measurements of the cosmic microwave background 
(CMB) by the Wilkinson Microwave Anisotropy Probe 
(WMAP) have, over the last few years [J-ll], produced 
ever more refined constraints on the cosmological param- 
eters, particularly those relating to the spectrum of pri- 
mordial perturbations. Typically, when constraints are 
quoted this is done under the assumption that a par- 
ticular underlying model is the correct one, but there 
remains some uncertainty as to which is the most appro- 
priate model to fit to the data. One approach to this is 
model selection, which asks the data to rank the models 
under consideration, but current applications 0-Q indi- 
cate that several models are still allowed. 

Such uncertainty in the correct choice of model can 
be handled by the technique of Bayesian model averag- 
ing flfjj , which allows one to assess parameter uncertain- 
ties in the presence of model uncertainty. The individual 
posteriors from different models will contribute to the 
model-averaged posterior, weighted by their model like- 
lihood. Our aim in this paper is to apply this technique 
to measurements of the primordial spectrum. In Section 
im we review the Bayesian ideas we are applying to mea- 
surements of the primordial power spectrum. In Section 
Mil we carry out the analysis. 



II. MULTI-MODEL BAYESIAN STATISTICS 

Baycs' theorem states the relationship between models 
(A4), parameters (9) and data (D) 

P(S\D M) = nD\lM)Pjd\M) 
1 1 ' ' P(D\M) ' 1 ' 

where P(8\A4) is the prior probability distribution of the 
parameters (assuming some model), P(D\8, Ai) is the 



likelihood, and P(6\D, A4) is the posterior probability 
distribution of the parameters. The prior is updated to 
the posterior by the likelihood. The term P(D\A4) rep- 
resents the model likelihood, and is called the evidence. 
In the case of single-model inference (where only a single 
model or set of parameters is considered), the evidence 
is simply a normalizing constant, set to satisfy the con- 
dition that the posterior distribution sums to unity. 

However, in most interesting cases in cosmology, the 
correct model is not known, and the evidence takes dif- 
ferent values for different models. We can use Baycs' 
theorem again, at one level above, to calculate the pos- 
terior odds between different models and perform model 
selection, 



P{Mi\D) P(D\Mi)P(Mi) 



P{M 2 \D) P(D\M 2 )P(Mi 



(2) 



Here Mi and M2 are the different models under consid- 
eration, P(Aii) gives the prior probability of each model, 
and P{Mi\D) is the model posterior probability. Thus 
the model posterior probability is updated from the prior 
by the evidence. If the model priors are equal then the 
ratio of posteriors is simply the ratio of evidences. The 
ratio of evidences is commonly known as the Baycs factor 
B fill, and an interpretation scale was suggested by Jef- 
freys [12[ (though some authors have started to use differ- 
ent language to qualify the different levels, e.g. Ref. 0). 
Many papers have already been written about the use of 
the evidence for cosmological model selection Ell ■ 

The logical procedure would be to first perform a 
model selection analysis to find the best model. Hav- 
ing done so, we would then perform parameter inference 
for the parameters of that single best model. However it 
is possible, even likely, that no model will have decisive 
evidence (In B > 5) over all competing models. If we 
want to include this model uncertainty in the parameter 
posteriors, we could instead produce a model- averaged 
posterior distribution jToj . where the individual posteri- 
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ors from each model are summed together, weighted by 
the model posterior values, 



P(6\D) 



^ k P{e\D,M k )P(M k \D) 
E k P(M k \D) 



(3) 



This model-averaged posterior encodes the uncertainty 
as to the correct model. 1 This model-averaging proce- 
dure has been used before in cosmology [l5| and astro- 
physics/geophysics [l6| . 



III. APPLICATION TO DATA 

The primordial power spectrum of scalar perturbations 
is normally modeled through a modified power-law func- 
tion of wavenumber fc, 



A^(fc) 



A^*) ( f 



(n s -l) + i ln(fc/fc„)n r 



(4) 



where the amplitude is defined at a pivot scale (fc*), n s 
is the spectral index (also known as the tilt), and n run is 
the running of the spectral index. We also refer to A\ at 
the pivot scale as A s . A maximally-symmetric Harrison- 
Zel'dovich (HZ) [l?} model has equal power on all scales, 
so the spectral index is unity and the running is zero. We 
discuss the choice of fc* below. 

Inflation, currently our best model for the generation of 
the spectrum of Gaussian, adiabatic supcrhorizon pertur- 
bations, additionally predicts a spectrum of tensor per- 
turbations, which are also modeled through a power law, 



Al(k) = A 2 h (k*) 



(5) 



where tit is the spectral index of the tensor perturbations 
(the tensor running is normally neglected). Single- field, 
slow-roll inflation predicts a consistency relation between 
the scalar and tensor amplitudes (measured at the same 
scale) in terms of the tensor spectral index, 



AUK 



— r = — 8iit , 



(0) 



and we will enforce this throughout. 

We considered five different models of the spectrum of 
primordial perturbations in this analysis: 

I. A scale-invariant HZ spectrum of scalar perturba- 
tions with no tensor component (n s = 1, r = 0). 

II. A tilted model, where the spectral index is allowed 
to vary, still with no tensors. 



1 Though it may be that in the end the 'true' model is not even one 
that we have considered at the time of the analysis. "Essentially, 
all models are wrong, but some are useful." 



Models 


Parameter 


Min 


Max 


All 


tl b h 2 


0.018 


0.032 




(l c h 2 


0.04 


0.16 




e 


0.98 


1.1 




T 


0.01 


0.3 




ln[10 1( U s ] 


2.6 


4.2 




^sz 





2 


tilt, tensor, run, tensor+run 


n a 


0.8 


1.2 


tensor, tensor+run 


r 





1 


run, tensor+run 


"run 


-0.1 


0.1 



TABLE I: Prior ranges for the parameters in the different 
models. We considered only uniform priors in this analy- 
sis. The priors on power spectrum parameters are set at 
ko = 0.05 Mpc -1 , the default for CosmoMC, but are so wide 
compared to the posteriors that the subsequent translation to 
the pivot scale is unaffected. 



III. A running model, where both the spectral index and 
the running of the spectrum (n rim ) are allowed to 
vary. 

IV. A tensor model, where the spectral index of the 
scalar perturbations and the tensor-to-scalar ampli- 
tude ratio (r) are allowed to vary. 

V. A tensor+running model, where the spectral index, 
tensor-to-scalar ratio, and running all vary. 

The priors on the parameters in these models are given 
in Table |U 

In this analysis we used measurements of the CMB 
temperature and polarization power spectra from both 
the WMAP 5yr Q and 7yr 0] releases, to explore how 
WMAP has improved in its ability to distinguish between 
different models of the primordial power spectrum. A 
compilation of WMAP 7yr and ground-based CMB ex- 
periments (ACBAR [3, CBI gjl and BOOMERanG 
[2(j), along with the Sloan Digital Sky Survey (SDSS) 
Data Release 7 [2l[ measurements of the galaxy cluster- 
ing power spectrum, was also studied. 

We used nested sampling [22[ to compute the evidence 
values and posterior distributions for the different mod- 
els, making use of CosmoNest [3] and MultiNest [23j as 
additional modules for the CosmoMC [24[ analysis code. 

The evidence values for the various models and differ- 
ent data compilations are given in Table [TTJ We find that 
the tilted model is the favored model for all data compila- 
tions, except for WMAP 7yr+ext where it is tied with the 
tensor+running model within the evidence uncertainties. 
The tilted model is favored over the HZ with strong, but 
not decisive, evidence (as defined in the Jeffrey's scale) 
using the combined data sets. The tensor model is dis- 
favored compared to the HZ using WMAP 5yr, and 7yr 
data alone, but becomes mildly favored when the other 
data is included (though the evidence differences are too 
small to be conclusive) . The running model has approxi- 
mately the same evidence as the HZ for WMAP 5yr and 
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FIG. 1: The posterior probability distributions for the different models, using only the WMAP 5yr data. The models are HZ 
(black), tilted (red), running n s +n run (blue), 'inflation' n s +r (magenta) and tensor+running (green). 





Model 




Datasets 






WMAP 5yr 


WMAP 7yr 


WMAP 7yr±cxt 


I. 


HZ 


0.0 ±0.1 


0.0 ±0.2 


0.0 ±0.2 


II. 


n B 


0.5 ±0.2 


1.0 ±0.2 


3.0 ±0.2 


III. 


T^s ~T~^TUI1 


-0.1 ±0.2 


0.4 ±0.2 


3.4 ±0.2 


IV. 


n s +r 


-1.3 ±0.1 


-0.9 ±0.2 


0.8 ±0.2 


V. 


n s +n run +r 


-1.1 ±0.2 


-0.7 ±0.2 


2.2 ±0.2 



TABLE II: (log-)evidence differences for the different mod- 
els, compared to the HZ model for that data compilation. 
Positive values mean that the model is favored over HZ. The 
unnormalized evidence values for the HZ model are -1346.3 
(for WMAP 5yr), -3754.5 (for WMAP 7yr) and -3834.3 (for 
WMAP 7yr±ext). 



7yr, but this increases to be about the same as the tilted 
model when the extra datasets are added. 

The question of what the probability of n s = 1 (i.e. 
the probability of the HZ model) is one of model selec- 
tion. Combining the evidence values with model prior 
probabilities, we can compute the normalized model pos- 
terior for each of the models, using a normalized version 
of Eq. © 



P{M l \D) = 



P{D\Mj)P{M,) 
E k P(D\M k )P(M k ) 



(7) 



Assuming equal prior probabilities for each of the mod- 
els, P(HZ) = 0.24 for WMAP 5yr, 0.164 for WMAP 7yr 
and 0.016 for WMAP 7yr±cxt. So even WMAP 7yr data 
is not strong enough to exclude an HZ spectrum by itself, 
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FIG. 2: The posterior probability distributions for the different models, using only the WMAP 7yr data. The models are HZ 
(black), tilted (red), running n s +n run (blue), 'inflation' n s +r (magenta) and tensor+running (green). 



but the addition of extra data reduces its model proba- 
bility by a factor of 10, reducing it to less than 2%. 

Other recent papers [H, Q have also computed the 
Bayes factors for models of the primordial power spec- 
trum, using the WMAP 5yr data and other extra data 
sets. Though it is difficult to do a direct comparison 
of raw evidence values between our analysis and others 
(owing to slightly different choices of priors), their basic 
conclusions are the same as ours: tilted models are fa- 
vored over HZ, a tensor model is disfavored relative to 
the others, and running models have roughly the same 
evidence as tilted models when other data is added in. 

The one-dimensional marginalized posteriors for each 
parameter, model and data compilation are shown in 
Figs. [T] (for WMAP 5yr),[2](for WMAP 7yr), and [3] (for 



WMAP 7yr+ext). The posteriors are normalized in the 
usual way, such that the area under each curve is unity. 
For models where a parameter is not varied (such as the 
spectral index in the HZ model), a delta function at the 
appropriate parameter value is the relevant posterior. 

In terms of plotting constraints on the power spectrum 
parameters, the choice of pivot scale k* is important. If 
it is not optimized, marginalized constraints can appear 
much weaker than they actually are. We chose the scale 
that decorrelates the uncertainty on the tilt and the run- 
ning (in the running model), following the method de- 
scribed in Ref. [H]. This scale is found to be 0.013 Mpc" 1 
for the WMAP 5yr and 7yr data and 0.015 Mpc -1 for 
the WMAP+cxt dataset. 

For the Bayesian model averaging, we assume that the 
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FIG. 3: The posterior probability distributions for the different models, using only the WMAP 7yr data plus other datasets. 
The models are HZ (black), tilted (red), running n s +n Iun (blue), 'inflation' n s +r (magenta) and tensor+running (green). 



prior model probabilities are equal. Variation of this as- 
sumption could readily be explored using the quoted evi- 
dence values; for instance one might want to downweight 
HZ as it is not based on a physical model, or models 
with running as inflationary models with large running 
are hard to construct. Model averages are carried out by 
combining the posterior samples from different models 
weighted by the appropriate model probability. 

We do not show a model-averaged result for A s , as we 
did not optimize the pivot scale for the amplitude; in the 
figures one sees that the central amplitude is shifted in 
the running models. This means that the amplitude is 
best determined at some other scale, and a model aver- 
aging should only be carried out at that pivot scale if 
constraining power is not to be lost. In any case one 



can see by eye that model averaging will have little effect 
on the constraints on A s , which is well determined in all 
models. 

The tilt n a is more interesting, as it is not so well de- 
termined due the residual probability that HZ is correct. 
Its model-averaged posterior is shown in Fig. [4j Note 
that in analyses with WMAP data alone the HZ 'spike' 
is prominent in the model-averaged posteriors, containing 
a significant fraction of the posterior probability. Only 
once other data are brought in does its effect become 
small. From the complete data compilation, we find that 
the model-averaged limits on the tilt at the pivot scale 
are 0.940 < n s < 1.000 (95% credible interval), the upper 
limit being precisely at one as it happens to fall within 
the delta-function component from the HZ model. 
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WMAP 5yr 




0.92 0.94 0.96 0.98 
n 



1 .02 1 .04 



WMAP 7yr 




WMAP 7yr+ext 




0.92 



FIG. 4: The model-averaged posterior distributions for the 
spectral index n s , using the WMAP 5yr data only (top), the 
WMAP 7yr data only (middle) and the WMAP 7yr+ext com- 
pilation (bottom). The probability distribution includes a 
delta function around n s = 1, artificially broadened in the 
plot by the binning process. 



Variation of prior assumptions can modify these re- 
sults somewhat. Changes to the prior ranges of param- 
eters common to all models, such as h and r, will have 
no effect, at least as long as the data constrains the val- 
ues to lie well within the prior as it does in these cases. 
Modifying the priors on the parameters that are varied 
only in some models can shift the results. As an example, 
we consider doubling the prior range of n s , to [0.6, 1.4]. 
As the added range fits the data poorly, it has negligi- 
ble likelihood and this halves the evidence of models in 
which n s varies, i.e. their log evidences in Table [TT] are 
reduced by In 2 ~ 0.69, which changes the quantitative 
outcome but not the qualitative one. If one recomputes 
the 95% confidence range of n s under this assumption 
the range is unchanged. Changes to the assumed prior 
model probabilities can be handled similarly. 

Finally, we consider the tensors. As they are not de- 



cs 



0.6 



0.4 



0.2 



■ all 5 models 

. tensor & tensor + n models 



0.2 0.4 0.6 0.8 1 

r 

FIG. 5: The model-averaged cumulative probability distribu- 
tion for the tensor-to-scalar ratio r using the WMAP 7yr+ext 
compilation. The solid curve gives the probability averaged 
over all five models, whereas the dashed curve gives the proba- 
bility averaged just over those models where r is varied (tensor 
and tensor+running). The dotted line shows the 95% credible 
limit. 



tected, model uncertainty can have a significant impact 
by lending support to models in which they are entirely 
absent, i.e. r is precisely zero. If we average over all 
five models, the model-averaged 95% upper limit on the 
tcnsor-to-scalar ratio is r < 0.16 (again at the pivot 
scale). This is indeed somewhat tighter than results from 
individual models (e.g. the equivalent upper limit from 
the 'inflation' model is 0.18) because it allows for the pos- 
sibility of no tensors. Nevertheless, this result is clearly 
highly prior dependent, and would for instance change 
if one decided that a logarithmic prior on r were more 
appropriate. 2 



An alternative tensor limit can be obtained by averag- 
ing only over the two models which permit tensors, which 
gives r < 0.30. The cumulative model-averaged proba- 
bilities for r under both assumptions are shown in Fig. [5] 
It is clear that any upper limit quoted on the tensor frac- 
tion has significant model and prior uncertainty, as well 
as observational uncertainty. 



2 In practice a logarithmic prior on r puts almost all the prior 
model probability at very small r values, yielding results near 
identical to a tensorless model provided tensors are not detected. 
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IV. CONCLUSIONS 

We have illustrated the methodology of Bayesian 
model averaging using primordial power spectrum esti- 
mation. From a Bayesian viewpoint, the purpose of all 
data analysis is to start with prior information, perhaps 
entirely theoretically motivated, and take data of increas- 
ing quality until the data likelihood is able to convinc- 
ingly overcome the prior uncertainty. Bayesian model av- 
eraging allows us to include the prior model uncertainty 
as well as the prior parameter uncertainty in this pro- 
cess, and hence offers a more complete incorporation of 
theoretical uncertainties. 

Presently data arc unable to decisively distinguish 
amongst different models for the primordial perturba- 
tions, with all five that we discuss remaining viable at 
some level. Despite that, parameters such as the spectral 
amplitude that are very accurately measured are quite 
unaffected by model uncertainty. Parameters moderately 
well determined, such as n s , can see significant effects 
from model uncertainty, while undetermined parameters 
such as r are naturally the most sensitive to the increased 
incorporation of uncertainties. 

Bayesian Model Averaging can be used beyond cos- 
mological models to any problem in cosmology or astro- 
physics where the underlying model is uncertain. This 
may be uncertainty as to the nature of the physical object 
(e.g. unresolved galaxies of different species contributing 
to a background) or with regards to the data analysis 



(e.g. supernova light curve analysis, where a number of 
different light curve fitters arc available). 

As with any Bayesian analysis, the results will have 
some dependence on the choices of priors, including the 
model prior probabilities. We should not be afraid of 
this; the opportunity to choose appropriate priors is our 
chance to deploy our physical intuition. Readers who 
prefer different priors are welcome to recalculate if they 
wish; this is particularly easy for model priors as the 
evidence ratios we quoted are all that is needed. We also 
briefly discussed a modification of parameter priors; a 
full analysis should explore the reasonable range of prior 
possibilities. Only by combining the full range of prior 
uncertainties, at both parameter and model level, with 
observational uncertainties can one obtain the full picture 
of current understanding. 
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